arXiv: 1508.01171 v2 [cs.DB] 28 Jul 2016 




Meta-MapReduce

Abstract: MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud
services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is the next 
challenge where MapReduce should be modified to avoid (big) data migration across remote (cloud) sites. This is exactly our scope of 
research, where only the very essential data for obtaining the result is transmitted, reducing communication, processing and preserving 
data privacy as much as possible. In this work, we propose an algorithmic technique for MapReduce algorithms, called 
Meta-MapReduce, that decreases the communication cost by allowing us to process and move metadata to clouds and from the map 
phase to reduce phase. In Meta-MapReduce, the reduce phase fetches only the required data at required iterations, which in turn, 
assists in preserving the data privacy. 

Index Terms: MapReduce algorithms, distributed computing, locality of data and computations, mapping schema, and reducer
capacity.

1 Introduction

MapReduce [1] is a programming system used for parallel processing of large-scale data. The given input data is processed by the map phase that applies a user-defined map function to each input and produces intermediate data (of the form (key, value)). Afterwards, intermediate data is processed by the reduce phase that applies a user-defined reduce function to keys and all their associated values. The final output is provided by the reduce phase. A detailed description of MapReduce can be found in Chapter 2 of [2].

Mappers, Reducers, and Reducer Capacity. A mapper is an application of a map function to a single input. A reducer is an application of a reduce function to a single key and its associated list of values. The reducer capacity [3], an important parameter, is an upper bound on the sum of the sizes of the values that are sent to the reducer. For example, the reducer capacity may be the size of the main memory of the processors on which the reducers run. The capacity of a reducer is denoted by q, and all the reducers have an identical capacity.

1.1 Locality of Data and Communication Cost

Input data to a MapReduce job, on one hand, may exist at 
the same site where mappers and reducers reside. However, 
ensuring an identical location of data and mappers-reducers 
cannot always be guaranteed. On the other hand, it may be 
possible that a user has a single local machine and wants to 
enlist a public cloud to help data processing. Consequently, 
in both the cases, it is required to move data to the location 
of mappers-reducers. Interested readers may see examples 
of MapReduce computations where the locations of data 
and mappers-reducers are different in 0- A review 
on geographically distributed data processing frameworks 
based on MapReduce may be found in [6].

In order to motivate and demonstrate the impact of different locations of data and mappers-reducers, we consider two real examples, as follows:

Amazon Elastic MapReduce. Amazon Elastic MapReduce (EMR)¹ processes data that is stored in Amazon Simple Storage Service (S3)², where the locations of EMR and S3 are not identical. Hence, it is required to move data from S3 to the location of EMR. However, moving the whole dataset from S3 to EMR is not efficient if only a small specific part of it is needed for the final output.

G-Hadoop and Hierarchical MapReduce. Two new implementations of MapReduce, G-Hadoop [5] and Hierarchical MapReduce [7], perform MapReduce computations over geographically distributed clusters. In these new implementations, several clusters execute an assigned MapReduce job in parallel, and provide partial outputs. Note that the output of a cluster is not the final output of a MapReduce job, and the final output is produced by processing partial outputs of all the clusters at a single cluster. Thus, inter-cluster data transfer (transmission of partial outputs of all the clusters to a single cluster) is required for producing the final output, as the location of the partial outputs of all the clusters and the location of the final computation are not identical. However, moving all the partial outputs of all the clusters to a single cluster is also not efficient if only a small portion of the clusters' outputs is needed to compute the final output.

1. http://aws.amazon.com/elasticmapreduce/

2. http://aws.amazon.com/s3/



Fig. 1: Different locations of MapReduce clusters in Hierarchical MapReduce.

Hierarchical MapReduce is depicted in Figure 1, where three clusters process data using MapReduce in parallel, and the output of all three clusters is required to be sent to one of the clusters or another cluster, which executes a global reducer for providing the final output. In Figure 1, it is clear that the locations of partial outputs of all the clusters and the location of the global reducer are not identical; hence, partial outputs of all the clusters are required to be transferred to the location of the global reducer.

Communication cost. The communication cost dominates the performance of a MapReduce algorithm and is the sum of the amount of data that is required to move from the location of users or data (e.g., S3) to the location of mappers (e.g., EMR) and from the map phase to the reduce phase in each round of a MapReduce job. For example, in Figure 1, the communication cost is the sum of the total amount of data that is transferred from mappers to reducers in each cluster and from each cluster to the site of the global reducer.

In this paper, we are interested in minimizing the data transferred in order to avoid communication and memory overhead, as well as to protect data privacy as much as possible. In MapReduce, we transfer inputs to the site of mappers-reducers from the site of the user, and then several copies of inputs from the map phase to the reduce phase in each iteration, regardless of their involvement in the final output. If only a few inputs are required to compute the final output, then it is not communication efficient to move all the inputs to the site of mappers-reducers, and then copies of the same inputs to the reduce phase.

There are some works that consider the location of data [8], [9] in a restrictive manner and some works [10], [11], [12] that consider data movement from the map phase to the reduce phase. We enhance the model suggested in [12] and suggest an algorithmic technique for MapReduce algorithms to decrease the communication cost by moving only relevant input data to the site of mappers-reducers. Specifically, we move metadata of each input instead of the actual data, execute MapReduce algorithms on the metadata, and only then fetch the actual required data needed to compute the final output.

1.2 Motivating Examples 

We present two examples (equijoin and entity resolution) to show the impact of different locations of data and mappers-reducers on the communication cost involved in a MapReduce job.

Equijoin of two relations X(A, B) and Y(B, C). Problem statement: The join of relations X(A, B) and Y(B, C), where the joining attribute is B, provides output tuples (a, b, c), where (a, b) is in X and (b, c) is in Y. In the equijoin of X(A, B) and Y(B, C), all tuples of both the relations with an identical value of the attribute B should appear together at the same reducer for providing the final output tuples. In short, a mapper takes a single tuple from X(A, B) or Y(B, C) and provides (B, X(A)) or (B, Y(C)) as key-value pairs. A reducer joins the assigned tuples that have an identical key. In Figure 2, two relations X(A, B) and Y(B, C) are shown, and we consider that the size of all the B values is very small as compared to the size of values of the attributes A and C.

Fig. 2: Equijoin of relations X(A, B) and Y(B, C).

Communication cost analysis: We now investigate the impact of different locations of the relations and mappers-reducers on the communication cost. In Figure 2, the communication cost for joining the relations X and Y (where X and Y are located at two different clouds and the equijoin is performed on a third cloud) is the sum of the sizes of all three tuples of each relation that are required to move from the location of the user to the location of mappers, and then, from the map phase to the reduce phase. Consider that each tuple is of unit size, and hence, the total communication cost is 12 for obtaining the final output.

However, if there are only a few tuples having an identical B-value in both the relations, then it is useless to move the whole relations from the user's location to the location of mappers, and then, tuples from the map phase to the reduce phase. In Figure 2, two tuples of X and two tuples of Y have a common B value (i.e., b_1). Hence, it is not efficient to send tuples having values b_2 and b_3, and by not sending tuples with B values b_2 and b_3, we can reduce the communication cost.
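As a rough illustration of this cost analysis, the following Python sketch counts unit-size tuple transfers for a standard MapReduce equijoin. The tuple contents are assumptions chosen to match the description of Figure 2 (two tuples of each relation share b1, and the remaining tuples carry b2 and b3); the function name `standard_equijoin_cost` is hypothetical.

```python
from collections import defaultdict

# Assumed contents of Figure 2: two tuples of X and two tuples of Y share b1.
X = [("a1", "b1"), ("a2", "b1"), ("a3", "b2")]   # X(A, B)
Y = [("b1", "c1"), ("b1", "c2"), ("b3", "c3")]   # Y(B, C)

def standard_equijoin_cost(x_tuples, y_tuples):
    """Count unit-size transfers and compute the join in plain MapReduce."""
    # Hop 1: every tuple moves from the user's site to the mappers.
    to_mappers = len(x_tuples) + len(y_tuples)

    # Map phase: emit (B, value); each pair moves on to the reduce phase.
    groups = defaultdict(lambda: {"X": [], "Y": []})
    to_reducers = 0
    for a, b in x_tuples:
        groups[b]["X"].append(a)
        to_reducers += 1
    for b, c in y_tuples:
        groups[b]["Y"].append(c)
        to_reducers += 1

    # Reduce phase: join the values that share a key.
    output = [(a, b, c)
              for b, g in groups.items() for a in g["X"] for c in g["Y"]]
    # Only tuples whose key occurs in both relations contribute to the output.
    needed = sum(len(g["X"]) + len(g["Y"])
                 for g in groups.values() if g["X"] and g["Y"])
    return to_mappers + to_reducers, needed, output

cost, needed, output = standard_equijoin_cost(X, Y)
print(cost, needed, len(output))  # 12 4 4
```

Under these assumptions, all six tuples are moved twice (12 units), although only four of them take part in the output.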

Entity resolution. Problem statement: Entity resolution using MapReduce is suggested in [12]. A solution to the entity resolution problem provides disjoint subsets of records, where records match other records if they pass a similarity function, and these records belong to a single entity (or person). For example, if voter-card, passport, student-id, driving-license, and phone numbers of several people are given, a solution to the entity resolution problem makes several subsets, where a subset corresponding to a person holds their voter-card, passport, student-id, driving-license, and phone numbers.

Communication cost analysis: The authors of [12] provided a solution to the entity resolution problem that decreases the communication cost between the map phase and the reduce phase, by not sending the record of a person to a reducer if the person has only a single record. However, in this model, for every pair of records that share a reducer, a copy of one is transferred to the site of reducers, and in this manner the communication cost is n(n-1)/2, if n records are assigned to a reducer. Note that by using the proposed technique of this paper, the communication cost for the entity resolution problem will be n, if n records are assigned to a reducer.

1.3 Problem Statement and Our Contribution 

We are interested in reducing the amount of data to be transferred to the site of the cloud executing MapReduce computations and the amount of data transferred from the map phase to the reduce phase. From the preceding two examples, it is clear that in several problems, the final output depends on only some of the inputs, and in those cases, it is not required to send the whole input data to the site of mappers and then (intermediate output) data from the map phase to the reduce phase. Specifically, we consider two scenarios for reducing the communication cost, where: (i) the locations of data and mappers-reducers are different, and (ii) the locations of data and mappers are identical. Note that in the first case, data is required to move from the user's site to the computation site and then from the map phase to the reduce phase, while in the second case, data is transferred only from the map phase to the reduce phase.

In addition to the locations of data and computations, we also consider the number of iterations involved in a MapReduce job and the reducer capacity (i.e., the maximum size of inputs that can be assigned to a reducer).

Our contributions. In this paper, we provide the following: 

• An algorithmic approach for MapReduce algorithms. We provide a new algorithmic approach for MapReduce algorithms, Meta-MapReduce (Section 3), that decreases the communication cost significantly. Meta-MapReduce regards the locality of data and mappers-reducers and avoids the movement of data that does not participate in the final output. Particularly, Meta-MapReduce provides a way to compute the desired output using metadata³ (which is much smaller than the original input data) and avoids uploading the whole data (either because it takes too long or for privacy reasons). It should be noted that we are enhancing MapReduce and not creating an entirely new framework for large-scale data processing; thus, Meta-MapReduce is implementable in state-of-the-art MapReduce systems such as Spark, Pregel [14], or modern Hadoop.

3. The term metadata is used in a different manner, and it represents a small subset, which varies according to tasks, of the dataset.

• Data-privacy in MapReduce computations. Meta-MapReduce also allows us to protect data privacy as much as possible in the case of an honest-but-curious adversary by not sending all the inputs. For example, in the case of equijoin, processing a tuple (a_i, b_i) of a relation X and a tuple (b_j, c_j) of a relation Y based on metadata does not reveal the actual tuple information until it is required at the cloud. It should be noted that the relations X and Y can deduce that the relations Y and X have no tuple with value b_i and b_j, respectively. However, the outcome for both relations does not imply the actual values of a_i, b_i, and c_j.

Nevertheless, a malicious adversary can be detected by an auditing process. Moreover, in some settings auditing forces participants to be honest-but-curious rather than malicious, as malicious actions can be audited, discovered, and lead to punitive actions.

• Other applications.

- We integrate Meta-MapReduce for processing geographically distributed data (Section 4.1) and for processing multi-round MapReduce jobs (Section 4.3).

- We show versatile applicability of Meta-MapReduce by performing equijoin, k-nearest-neighbors, and shortest path findings on a social networking graph using Meta-MapReduce and show how Meta-MapReduce decreases the communication cost (Sections 3.1 and 5).

1.4 Related Work 

MapReduce was introduced by Dean and Ghemawat in 2004 [1]. ComMapReduce [11] decreases communication between the map and reduce phases (for some problems) by allowing mappers to inform a coordinator about their outputs, and the coordinator finds those outputs of mappers that will not participate in the final output. The coordinator informs reducers not to fetch those (undesired) outputs of the map phase. However, the model does not allow computation over metadata for discovering the actual data needed. Thus, in their limited scope of data filtering, there is no need to consider the number of iterations, the reducer capacity, and the locality of data.

Another framework, Purlieus [8], considers the location of the data only and tries to deploy mappers and reducers near the location of data. In [12], a model is given for computing records that belong to a single person. The model decreases the communication cost by not sending the record of a person to the reduce phase if the person has only a single record. However, the model does not consider the reducer capacity and different locations of data and mappers-reducers. Afrati et al. [3] present a model regarding the reducer capacity to find a lower bound on communication cost as a function of the total size of inputs sent to a reducer. The model considers a problem where an output depends on exactly two inputs. However, the model assumes an identical location of data and mappers-reducers.

In summary, all the existing models provide a number of ways that MapReduce algorithms can be sped up if we are running on a platform that allows certain operations. However, these models cannot be implemented for reducing the communication cost in the case of geographically distributed MapReduce and for preserving data privacy. Moreover, all the models require data movement from the location of data to the location of computations and between the map and reduce phases irrespective of their involvement in outputs and the number of iterations of MapReduce jobs, resulting in a huge communication cost. In this paper, we explore hashing and filtering techniques for reducing the communication cost and try to move only relevant data to the site of mappers-reducers. The ability to use these algorithms depends on the capability of the platform, but many systems today, such as Pregel, Spark, or recent implementations of MapReduce, offer the necessary capabilities.

Bin-packing-based approximation algorithms. We will use bin-packing-based approximation algorithms [3] for assigning inputs to reducers. These algorithms are designed for two classes of problems, which cover many MapReduce-based problems. The first problem deals with assigning each pair of inputs to at least one reducer in common, without exceeding q. The second problem deals with assigning each pair of inputs to at least one reducer in common, without exceeding q, where the two inputs belong to two different sets of inputs. The bin-packing-based algorithm works as follows: use a known bin-packing algorithm to place the given m inputs of size at most q/k into bins of size q/k. Assume that we need x bins to place the m inputs. Now, each of these bins is considered as a single input of size q/k for our problem, and pairs of these bins are assigned to reducers.
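The bin-packing idea can be sketched in Python as follows. This is a minimal illustration, not the algorithm of [3] itself: it assumes first-fit decreasing as the "known bin packing algorithm", fixes the bin size at q/2 so that any two bins fit in one reducer, and addresses only the first problem class (every pair of inputs must meet). The function names are hypothetical.

```python
from itertools import combinations

def first_fit_decreasing(sizes, bin_capacity):
    """Pack input indices into bins of the given capacity (first-fit decreasing)."""
    bins, loads = [], []
    for idx in sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True):
        if sizes[idx] > bin_capacity:
            raise ValueError("input larger than bin capacity")
        for b in range(len(bins)):
            if loads[b] + sizes[idx] <= bin_capacity:
                bins[b].append(idx)
                loads[b] += sizes[idx]
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([idx])
            loads.append(sizes[idx])
    return bins

def assign_all_pairs(sizes, q):
    """Return reducers (lists of input indices) such that every pair of inputs meets."""
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [bins[0]]
    # Each pair of bins forms one reducer of total size at most q.
    return [left + right for left, right in combinations(bins, 2)]

sizes = [3, 3, 2, 2, 1, 1]
reducers = assign_all_pairs(sizes, q=8)
for r in reducers:
    print(r, sum(sizes[i] for i in r))
```

Treating each bin as a single input keeps the pairing step simple: x bins yield x(x-1)/2 reducers, each within capacity q by construction.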

2 The System Setting 

The system setting is an extension of the standard setting [3],
where we consider, for the first time, the locations of data 
and mappers-reducers and the communication cost. The 
setting is suitable for a variety of problems where at least 
two inputs are required to produce an output. In order to 
produce an output, we need to define the term mapping 
schema, as follows: 

Mapping Schema. A mapping schema is an assignment of 
the set of inputs (i.e., outputs of the map phase) to some 
given reducers under the following two constraints: 

• A reducer is assigned inputs whose sum of the sizes is 
less than or equal to the reducer capacity q. 

• For each output produced by reducers, we must assign the corresponding inputs to at least one reducer in common.

For example, a mapping schema for the equijoin example will assign all tuples (of relations X and Y) having an identical key to a reducer such that the size of assigned tuples is not more than q.
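The two constraints can be checked mechanically. The sketch below is illustrative only; the function name `is_valid_mapping_schema`, the input identifiers, and the unit sizes are assumptions made for the example.

```python
def is_valid_mapping_schema(sizes, reducers, required_pairs, q):
    """Check both mapping-schema constraints.

    sizes: dict mapping an input to its size.
    reducers: list of sets of inputs, one set per reducer.
    required_pairs: pairs of inputs that must meet to produce an output.
    q: reducer capacity.
    """
    # Constraint 1: the inputs assigned to a reducer fit within capacity q.
    for assigned in reducers:
        if sum(sizes[i] for i in assigned) > q:
            return False
    # Constraint 2: the inputs of every output share at least one reducer.
    for u, v in required_pairs:
        if not any(u in assigned and v in assigned for assigned in reducers):
            return False
    return True

# Four unit-size tuples with an identical key: x1, x2 from X and y1, y2 from Y.
sizes = {"x1": 1, "x2": 1, "y1": 1, "y2": 1}
required = [(x, y) for x in ("x1", "x2") for y in ("y1", "y2")]
one_reducer = [{"x1", "x2", "y1", "y2"}]
two_reducers = [{"x1", "x2", "y1"}, {"x1", "x2", "y2"}]

print(is_valid_mapping_schema(sizes, one_reducer, required, q=4))   # True
print(is_valid_mapping_schema(sizes, one_reducer, required, q=3))   # False
print(is_valid_mapping_schema(sizes, two_reducers, required, q=3))  # True
```

The last case shows the usual trade-off: a smaller capacity can still be satisfied, but only by replicating x1 and x2 to two reducers.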

The Model. The model is simple but powerful and assumes the following:

1) Existence of systems such as Spark, Pregel, or modern Hadoop.

2) A preliminary step at the site of the user who owns the dataset for finding metadata⁴ that has a smaller memory size than the original data.

3) Approximation algorithms (given in [3]), which are based on a bin-packing algorithm, at the cloud or the global reducer in case of Hierarchical MapReduce [7]. The approximation algorithms assign outputs of the map phase to reducers while regarding the reducer capacity. Particularly, in our case, approximation algorithms will assign metadata to reducers in such a manner that the size of actual data at a reducer will not exceed the reducer capacity and all the inputs that are required to produce an output are assigned to at least one reducer in common.

4. The selection of metadata depends on the problem.

In the next section, we present a new algorithmic technique for MapReduce algorithms, where we try to minimize the communication cost regarding different locations of data and mappers-reducers with the help of a running example of equijoin.


3 Meta-MapReduce 

We present our algorithmic technique that reduces the communication cost for a variety of problems, e.g., join of relations, k-nearest-neighbors finding, similarity-search, and matrix multiplication. The proposed technique regards locality of data, the number of iterations involved in a MapReduce job, and the reducer capacity. The idea behind the proposed technique is to process metadata at mappers and reducers, and process the original required data at required iterations of a MapReduce job at reducers. In this manner, we suggest processing metadata at mappers and reducers at all the iterations of a MapReduce job. Therefore, the proposed technique is called Meta-MapReduce.

Before going into the details of Meta-MapReduce, we need to redefine the communication cost to take into account the size of the metadata, the amount of the (required) original data, which is transferred to reducers only at required iterations, and different locations of data and computations.


The communication cost for metadata and data. In the context of Meta-MapReduce, the communication cost is the sum of the following:

Metadata cost. The amount of metadata that is required to move from the location of users to the location of mappers (if the locations of data and mappers are different) and from the map phase to the reduce phase in each iteration of a MapReduce job.

Data cost. The amount of required original data that is needed to move to reducers at required iterations of a MapReduce job.

In Section 3.1, we explain the way Meta-MapReduce works, using an example of equijoin for a case of different locations of data and mappers. Following the example of equijoin, we also show how much communication cost is reduced due to the use of Meta-MapReduce.


3.1 Meta-MapReduce Working 

Fig. 3: Meta-MapReduce algorithmic approach.

In the standard MapReduce, users send their data to the site of the mappers before the computation begins. However, in Meta-MapReduce, users send metadata to the site of mappers, instead of original data; see Figure 3. Now, mappers and reducers work on metadata, and at required iterations of a MapReduce job, reducers call required original data from the site of users (according to assigned (key, value) pairs) and provide the desired result. We present a detailed execution to demonstrate Meta-MapReduce (see Figure 3), using the equijoin task, as follows:

STEP 1 Users create a master process that creates map tasks and reduce tasks at different compute nodes. A compute node that processes the map task is called a map worker, and a compute node that processes the reduce task is called a reduce worker.

STEP 2 Users send metadata, which varies according to an assigned MapReduce job, to the site of mappers. Also, the user creates an index, which varies according to the assigned job, on the entire database.

For example, in the case of equijoin (see Figure 2), a user sends metadata for each of the tuples of the relations X(A, B) and Y(B, C) to the site of mappers. In this example, metadata for a tuple i ((a_i, b_i), where a_i and b_i are values of the attributes A and B, respectively) of the relation X includes the size of all non-joining values (i.e., |a_i|⁵) and the value of b_i. Similarly, metadata for a tuple i ((b_i, c_i), where b_i and c_i are values of the attributes B and C, respectively) of the relation Y includes the size of all non-joining values (i.e., |c_i|) with b_i (remember that the size of b_i is much smaller than the size of a_i or c_i). In addition, the user creates an index on the attribute B of both the relations X and Y.

STEP 3 In the map phase, a mapper processes an assigned input and provides some number of (key, value) pairs, which are known as intermediate outputs; a value is the size of the corresponding input data (which is included in metadata). The master process is then informed of the location of intermediate outputs.

For example, in case of equijoin, a mapper takes a single tuple i (e.g., (|a_i|, b_i)) and generates some (b_i, value) pairs, where b_i is a key and a value is the size of tuple i (i.e., |a_i|). Note that in the original equijoin example, a value is the whole data associated with the tuple i (i.e., a_i).

5. The notation |a_i| refers to the size of an input a_i.

STEP 4 The master process assigns reduce tasks (by following a mapping schema as suggested in [3]) and provides information of intermediate outputs, which serve as inputs to reduce tasks. A reducer is then assigned all the (key, value) pairs having an identical key by following a mapping schema for an assigned job. Now, reducers perform the computation and call⁶ only required data if there is only one iteration of a MapReduce job. On the other hand, if a MapReduce job involves more than one iteration, then reducers call original required data at required iterations of the job (we will discuss multi-round MapReduce jobs using Meta-MapReduce in Section 4.3).

For example, in case of equijoin, a reducer receives all the (b_i, value) pairs from both the relations X and Y, where a value is the size of the tuple associated with key b_i. Inputs (i.e., intermediate outputs of the map phase) are assigned to reducers by following a mapping schema for equijoin such that a reducer is not assigned more original inputs than its capacity, and after that reducers invoke the call operation. Note that a reducer that receives at least one tuple with key b_i from both the relations X and Y produces outputs and requires original input data from the user's site. However, if the reducer receives tuples with key b_i from a single relation only, the reducer does not request the original input tuples, since these tuples do not participate in the final output.
6. The call operation will be explained in Section 3.2.

Following Meta-MapReduce, we now compute the communication cost involved in the equijoin example (see Figure 2). Recall that without using Meta-MapReduce, a solution to the equijoin problem (in Figure 2) requires 12 units of communication cost. However, using Meta-MapReduce for performing equijoin, there is no need to send the tuple (a_3, b_2) of the relation X and the tuple (b_3, c_3) of the relation Y to the location of computation. Moreover, we send metadata of all the tuples to the site of mappers, and intermediate outputs containing metadata are transferred to the reduce phase, where reducers call only desired tuples having the value b_1 from the user's site. Consequently, a solution to the problem of equijoin has only 4 units cost plus a constant cost for moving metadata using Meta-MapReduce, instead of 12 units of communication cost.
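The four steps can be traced on the same example with a short Python sketch. It is a single-process simulation under stated assumptions: the tuple contents are chosen to match the description of Figure 2, every tuple has unit size, the index is a plain dictionary keyed on B, and all function names are hypothetical.

```python
from collections import defaultdict

# Assumed contents of Figure 2 (unit-size tuples), kept at the user's site.
X = [("a1", "b1"), ("a2", "b1"), ("a3", "b2")]   # X(A, B)
Y = [("b1", "c1"), ("b1", "c2"), ("b3", "c3")]   # Y(B, C)

def build_index(x_tuples, y_tuples):
    """Step 2 (user site): index both relations on the joining attribute B."""
    index = defaultdict(lambda: {"X": [], "Y": []})
    for a, b in x_tuples:
        index[b]["X"].append((a, b))
    for b, c in y_tuples:
        index[b]["Y"].append((b, c))
    return index

def metadata(x_tuples, y_tuples):
    """Step 2: metadata is (B value, relation, size of the non-joining part)."""
    return ([(b, "X", 1) for _, b in x_tuples] +
            [(b, "Y", 1) for b, _ in y_tuples])

def meta_equijoin(x_tuples, y_tuples):
    index = build_index(x_tuples, y_tuples)
    # Steps 3 and 4: mappers emit (key, size) pairs; reducers group them by key.
    groups = defaultdict(lambda: {"X": 0, "Y": 0})
    for b, relation, size in metadata(x_tuples, y_tuples):
        groups[b][relation] += size
    fetched, output = 0, []
    for b, sizes in groups.items():
        if sizes["X"] and sizes["Y"]:              # key occurs in both relations
            xs, ys = index[b]["X"], index[b]["Y"]  # call: fetch via the index
            fetched += len(xs) + len(ys)
            output += [(a, b, c) for a, _ in xs for _, c in ys]
    return fetched, output

fetched, output = meta_equijoin(X, Y)
print(fetched, len(output))  # 4 4
```

Only the four tuples carrying b1 are fetched as original data; the tuples with b2 and b3 never leave the user's site.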


Theorem 1 (The communication cost) Using Meta-MapReduce, the communication cost for the problem of join of two relations is at most 2nc + h(c + w) bits, where n is the number of tuples in each relation, c is the maximum size of a value of the joining attribute, h is the number of tuples that actually join, and w is the maximum required memory for a tuple.

Proof. Since the maximum size of a value of the joining attribute, which works as metadata in the problem of join, is c and there are n tuples in each relation, users have to send at most 2nc bits to the site of mappers-reducers. Further, tuples that join at the reduce phase have to be transferred from the map phase to the reduce phase and then from the user's site to the reduce phase. Since at most h tuples join and the maximum size of a tuple is w, we need to transfer at most hc and at most hw bits from the map phase to the reduce phase and from the user's site to the reduce phase, respectively. Hence, the communication cost is at most 2nc + h(c + w) bits. ■
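To get a feel for the bound, the sketch below evaluates it next to the 4nw figure that Table 1 lists for plain MapReduce. The parameter values are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
def meta_join_bound(n, c, h, w):
    """Upper bound of Theorem 1 in bits: 2nc + h(c + w)."""
    return 2 * n * c + h * (c + w)

def standard_join_cost(n, w):
    """Cost listed for plain MapReduce in Table 1: 4nw bits."""
    return 4 * n * w

# Hypothetical workload: 1000 tuples per relation, 8-bit join values,
# 1000-bit tuples, and 50 tuples that actually join.
n, c, h, w = 1000, 8, 50, 1000
print(meta_join_bound(n, c, h, w))  # 66400
print(standard_join_cost(n, w))     # 4000000
```

The gap narrows as h approaches 2n or as c approaches w, which matches the stated premise that joining values are small and few tuples join.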

Further significant improvement. We note that it is possible to further decrease the communication cost by using two iterations of a MapReduce job, in which the first iteration is performed on metadata and the second iteration is performed on the required original data. Specifically, in the first iteration, a user sends metadata to some reducers such that reducers are not assigned more metadata than their capacity, and all these reducers compute the required original data and the optimal number of reducers for the task. Afterward, a new MapReduce iteration is executed on the required original data such that a reducer is not assigned more original inputs than its capacity. In this manner, we save replication of metadata and avoid some reducers that do not produce outputs.


3.2 The call Function 

In this section, we describe the call function that is invoked by reducers to obtain the original required inputs from the user's site to produce outputs.

All the reducers that produce outputs require the original inputs from the site of users. Reducers can know whether they produce outputs or not after receiving intermediate outputs from the map phase, and then inform the corresponding mappers from where they have fetched these intermediate outputs (for simplicity, we can say all reducers that will produce outputs send 1 to all the corresponding mappers to request the original inputs, otherwise send 0). Mappers collect requests for the original inputs from all the reducers and fetch the original inputs, if required, from the user's site by accessing the index file. Remember that in Meta-MapReduce, the user creates an index on the entire database according to an assigned job; refer to Step 2 in Section 3.1. This index helps to access required data that reducers want without doing a scan operation. Note that the call function can be easily implemented on recent implementations of MapReduce, e.g., Pregel and Spark.

For example, we can consider our running example of equijoin. In case of equijoin, a reducer that receives at least one tuple with key b_i from both the relations X(A, B) and Y(B, C) requires the original input from the user's site, and hence, the reducer sends 1 to the corresponding mappers. However, if the reducer receives tuples with key b_i from a single relation only, the reducer sends 0. Consider that the reducer receives (b_i, |a_i|) of the relation X and (b_i, |c_i|) of the relation Y. The reducer sends 1 to the corresponding mappers that produced the (b_i, |a_i|) and (b_i, |c_i|) pairs. On receiving requests for the original inputs from the reducer, the mappers access the index file to fetch a_i, b_i, and c_i, and then the mappers provide a_i, b_i, and c_i to the reducer.
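The 1/0 exchange can be sketched as two small functions. This is a simplified single-process model, not an implementation on Spark or Pregel: the message format, the mapper identifiers, the tuple identifiers, and the function names are all assumptions made for the example.

```python
def reducer_bits(received):
    """Decide the bit a reducer sends to each mapper it fetched from.

    received: (key, relation, mapper_id) metadata pairs held by one reducer
    for a single key. The bit is 1 only if both relations are present.
    """
    relations = {relation for _, relation, _ in received}
    bit = 1 if {"X", "Y"} <= relations else 0
    return {mapper_id: bit for _, _, mapper_id in received}

def mappers_fetch(bits, mapper_input, index):
    """Mappers that received 1 read their original tuple through the index."""
    return [index[mapper_input[m]] for m in sorted(bits) if bits[m] == 1]

# Hypothetical index at the user's site: tuple identifier -> original tuple.
index = {"x1": ("a1", "b1"), "y1": ("b1", "c1"), "x3": ("a3", "b2")}
# Which tuple identifier each mapper processed as metadata.
mapper_input = {0: "x1", 1: "y1", 2: "x3"}

joined = reducer_bits([("b1", "X", 0), ("b1", "Y", 1)])
alone = reducer_bits([("b2", "X", 2)])
print(mappers_fetch(joined, mapper_input, index))  # [('a1', 'b1'), ('b1', 'c1')]
print(mappers_fetch(alone, mapper_input, index))   # []
```

Because the lookup goes through the index, a mapper touches only the tuples that were requested and never scans the relation.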

3.3 Meta-MapReduce for Skewed Values of the Joining 
Attribute 

Consider two relations X(A,B) and Y(B,C), where the 
joining attribute is B and the size of all the B values is very 
small as compared to the size of values of the attributes A 
and C. One or both of the relations X and Y may have a 
large number of tuples with an identical B-value. A value 
of the joining attribute B that occurs many times is known 
as a heavy hitter. In skew join of X(A,B) and Y(B,C), all 
the tuples of both the relations with an identical heavy hitter 
should appear together to provide the output tuples. 

In Figure 4, b_1 is a heavy hitter; hence, it is required that all the tuples of X(A, B) and Y(B, C) with the heavy hitter, b_1, should appear together to provide the output tuples, (a, b_1, c) (a ∈ A, b_1 ∈ B, c ∈ C), which depend on exactly two inputs. However, due to a single reducer (for joining all tuples with a heavy hitter), there is no parallelism at the reduce phase, and a single reducer takes a long time to produce all the output tuples of the heavy hitter.



Fig. 4: Skew join example for a heavy hitter, b_1.


We can restrict reducers in a way that they can hold 
many tuples, but not all the tuples with the heavy-hitter- 
value. In this case, we can reduce the time and use more 
reducers, which results in a higher level of parallelism at 
the reduce phase. But, there is a higher communication cost, 
since each tuple with the heavy hitter must be sent to more 
than one reducer. (Details about join algorithms may be 
found in [15].) 
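The trade-off between parallelism and replication can be illustrated with a small sketch. The grid assignment below is one common way to split a heavy hitter across reducers and is an assumption made here for illustration; it is not necessarily the scheme of [15], and the function name `assign_heavy_hitter` is hypothetical.

```python
from itertools import product


def assign_heavy_hitter(x_ids, y_ids, p, q):
    """Spread the tuples of one heavy hitter over a p x q grid of reducers.

    Reducer (i, j) holds the i-th bucket of X tuples and the j-th bucket of
    Y tuples, so no single reducer has to hold all tuples of the heavy hitter.
    """
    reducers = {(i, j): ([], []) for i, j in product(range(p), range(q))}
    for n, x in enumerate(x_ids):
        i = n % p
        for j in range(q):          # each X tuple is replicated q times
            reducers[(i, j)][0].append(x)
    for n, y in enumerate(y_ids):
        j = n % q
        for i in range(p):          # each Y tuple is replicated p times
            reducers[(i, j)][1].append(y)
    return reducers


x_ids = [f"x{n}" for n in range(6)]
y_ids = [f"y{n}" for n in range(4)]
reducers = assign_heavy_hitter(x_ids, y_ids, p=3, q=2)

# Every (x, y) pair is produced by exactly one reducer.
pairs = [(x, y) for xs, ys in reducers.values() for x in xs for y in ys]
copies_sent = sum(len(xs) + len(ys) for xs, ys in reducers.values())
replication_rate = copies_sent / (len(x_ids) + len(y_ids))
```

With six reducers instead of one, each reducer holds at most four tuples, but the ten input tuples are sent 24 times in total, which is the extra communication cost mentioned above.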

We can solve the problem of skew join using Meta-MapReduce,
following the four steps suggested in Section 3.1.


Problem                                                  Section  Theorem  Communication cost        Communication cost
                                                                           using Meta-MapReduce      using MapReduce
Join of two relations                                    3.1      1        2nc + h(c + w)            4nw
Skewed values of the joining attribute                   3.3      2        2nc + rh(c + w)           2nw(1 + r)
Join of two relations by hashing the joining attribute   4.2      3        6n · log m + h(c + w)     4nw
Join of k relations by hashing the joining attributes    4.3      4        3knp · log m + h(c + w)   2knw

n: the number of tuples in each relation, c: the maximum size of a value of the joining attribute, r: the replication rate, h: the number of tuples
that actually join, w: the maximum required memory for a tuple, p: the maximum number of dominating attributes in a relation, and m: the
maximal number of tuples in all given relations.

TABLE 1: The communication cost for joining of relations using Meta-MapReduce.

Theorem 2 (The communication cost) Using Meta-MapReduce,
the communication cost for the problem of skew join of two
relations is at most 2nc + rh(c + w) bits, where n is the
number of tuples in each relation, c is the maximum size of
a value of the joining attribute, r is the replication rate
(see footnote 7), h is the number of distinct tuples that
actually join, and w is the maximum required memory for a
tuple.

Proof. From the user's site to the site of mappers-reducers,
at most 2nc bits are required to move (according to
Theorem 1). Since at most h distinct tuples join and these
tuples are replicated to r reducers, at most rhc bits are
required to transfer from the map phase to the reduce phase.
Further, h tuples of size at most w are to be transferred
from the map phase to the reduce phase, each to r reducers,
and hence, at most rhw bits are assigned to reducers. Thus,
the communication cost is at most 2nc + rh(c + w) bits. ■

3.4 Meta-MapReduce for an Identical Location of Data 
and Mappers 

We explained the way Meta-MapReduce acts in the case of
different locations of data and computation, and showed how
it provides desired outputs by considering only metadata of
inputs. Nevertheless, Meta-MapReduce also decreases the
amount of data to be transferred when the locations of 
data and computations are identical. In this case, mappers 
process only metadata of assigned inputs instead of the 
original inputs as in MapReduce, and provide (key, value) 
pairs, where a value is the size of an assigned input, not the 
original input itself. A reducer processes all the (key, value) 
pairs having an identical key and calls the original input 
data, if required. Consequently, there is no need to send all 
those inputs that do not participate in the final result from 
the map phase to the reduce phase. 

For example, in the case of equijoin, if the locations of
mappers and relations X and Y are identical, then a mapper
processes a tuple of either relation and provides (b_i, |a_i|)
or (b_i, |c_i|) as outputs. A reducer is assigned all the inputs
having an identical key, and the reducer calls the original
inputs if it has received inputs from both the relations.


4 Extensions of Meta-MapReduce

We have presented the Meta-MapReduce framework for different
and identical locations of data and mappers-reducers.
However, some extensions are required to use Meta-MapReduce
for geographically distributed data processing, for handling
large values of joining attributes, and for handling
multi-round iterations. In this section, we provide three
extensions of Meta-MapReduce.

4.1 Incorporating Meta-MapReduce in G-Hadoop and
Hierarchical MapReduce

G-Hadoop and Hierarchical MapReduce are two implementations
for geographically distributed data processing using
MapReduce. Both implementations assume that a cluster
processes data using MapReduce and provides its outputs to
one of the clusters, which provides the final outputs (by
executing a MapReduce job on the received outputs of all
the clusters). However, the transmission of the outputs of
all the clusters to a single cluster for producing the final
output is not efficient if not all the outputs of a cluster
participate in the final output.

We can apply the Meta-MapReduce idea to systems such as
G-Hadoop and Hierarchical MapReduce. Note that we do not
change the basic functionality of either implementation.
We take our running example of equijoin (see Figure 5,
where we have three clusters, possibly on three continents;
the first cluster has two relations U(A, B) and V(B, C), the
second cluster has two relations W(D, B) and X(B, E), and
the third cluster has two relations Y(F, B) and Z(B, G))
and assume that data exist at the site of mappers in each
cluster. In the final output, reducers perform the join
operation over all the six relations, which share an
identical B-value.

Fig. 5: Three clusters, each with two relations.

7. The replication rate [10] of a mapping schema is the average
number of key-value pairs for each input.


The following three steps are required for obtaining final
outputs using an execution of Meta-MapReduce over G-Hadoop
and Hierarchical MapReduce.

STEP 1 Mappers at each cluster process input data according
to an assigned job and provide (key, value) pairs,
where a value is the size of an assigned input.

For example, in Figure 5, a mapper at Cluster 1 provides
outputs of the form (b_i, |a_i|) or (b_i, |c_i|).

STEP 2 Reducers at each cluster provide partial outputs by
following an assigned mapping schema, and the partial
outputs, which contain only metadata, are transferred
to one of the clusters, which will provide final outputs.

For example, in the case of equijoin, reducers at each
cluster provide partial output tuples as (|a_i|, b_i, |c_i|)
at Cluster 1, (|d_i|, b_i, |e_i|) at Cluster 2, and
(|f_i|, b_i, |g_i|) at Cluster 3 (by following a mapping
schema for equijoin). Partial outputs of Cluster 1 and
Cluster 3 have to be transferred to one of the clusters,
say Cluster 2, for obtaining the final output.

STEP 3 A designated cluster for providing the final output
processes all the outputs of the clusters by implementing
the assigned job using Meta-MapReduce. Reducers
that provide the final output call the original input
data from all the clusters.

For example, in equijoin, after receiving outputs of
Cluster 1 and Cluster 3, Cluster 2 implements two
iterations for joining tuples. In the first iteration,
outputs of Clusters 1 and 2 are joined (by following a
mapping schema for equijoin), and in the second iteration,
the outputs of Cluster 3 and the output of the previous
iteration are joined at reducers. A reducer in the second
iteration provides the final output as
(|a_i|, b_1, |c_i|, |d_i|, |e_i|, |f_i|, |g_i|) and calls all
the original values of |a_i|, |c_i|, |d_i|, |e_i|, |f_i|, and
|g_i| for providing the desired output, as suggested in
Section 3.2.
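The three steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: the cluster contents are hypothetical (they are not the data of Figure 5), the function names are invented, and the two iterations at the designated cluster are collapsed into a single intersection of keys.

```python
def local_partial_output(left, right):
    """Steps 1 and 2: join on metadata inside one cluster.

    `left` and `right` map a B-value to the list of non-joining values that
    occur with it. Only sizes of those values appear in the partial output.
    """
    return {
        b: ([len(v) for v in left[b]], [len(v) for v in right[b]])
        for b in left.keys() & right.keys()
    }


def final_keys(partials):
    """Step 3: B-values that survive the join across all clusters."""
    keys = None
    for partial in partials.values():
        keys = set(partial) if keys is None else keys & set(partial)
    return keys or set()


# Hypothetical contents of three clusters (two relations each).
clusters = {
    1: ({"b1": ["a1"], "b2": ["a3", "a4"]}, {"b1": ["c1"], "b2": ["c2"]}),
    2: ({"b1": ["d1"], "b3": ["d3"]}, {"b1": ["e1"], "b4": ["e4"]}),
    3: ({"b1": ["f1"], "b5": ["f2"]}, {"b1": ["g1"], "b7": ["g3"]}),
}

partials = {cid: local_partial_output(l, r) for cid, (l, r) in clusters.items()}
keys = final_keys(partials)

# Only after the metadata join are original values requested from each cluster.
fetch_requests = {cid: sorted(keys) for cid in clusters}
```

Here Cluster 1 produces partial outputs for b1 and b2, but only b1 joins across all three clusters, so the original tuples with b2 are never transmitted.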

Communication cost analysis. In Figure 5, we are performing
equijoin in three clusters, and assuming that data is
available at the site of mappers in each cluster. In addition,
we consider that each value takes two units of size; hence,
any tuple, for example, (a_i, b_i), has a size of 4 units.

First, each of the clusters performs an equijoin within
the cluster using Meta-MapReduce. Note that using
Meta-MapReduce, there is no need to send any tuple from the
map phase to the reduce phase within the cluster, while
G-Hadoop and Hierarchical MapReduce do transfer data from
the map phase to the reduce phase, which results in 76
units of communication cost. Moreover, in G-Hadoop and
Hierarchical MapReduce, the transmission of two tuples
((a_3, b_2), (a_4, b_2)) of U, one tuple ((b_2, c_2)) of V,
two tuples ((d_2, b_2), (d_3, b_3)) of W, two tuples
((b_2, e_3), (b_4, e_4)) of X, two tuples ((f_2, b_5),
(f_3, b_6)) of Y, and one tuple ((b_7, g_3)) of Z from the
map phase to the reduce phase is useless, since they do not
participate in the final output.

After computing outputs within each cluster, metadata of
outputs (i.e., the sizes of tuples associated with key b_1
and key b_2) is transmitted to Cluster 2. Here, it is
important to note that only tuples with value b_1 provide
final outputs. Using Meta-MapReduce, we do not send the
complete tuples with value b_2; hence, we also decrease the
communication cost, while G-Hadoop and Hierarchical
MapReduce send all the outputs of the first and third
clusters to the second cluster. After receiving outputs from
the first and the third clusters, the second cluster performs
two iterations as mentioned previously, and in the second
iteration, a reducer for key b_1 provides the final output.
Consequently, the communication cost is only 36 units.

On the other hand, the transmission of outputs with data
from the first cluster and the third cluster to the second
cluster and the execution of two iterations result in 132
units of communication cost. Therefore, G-Hadoop and
Hierarchical MapReduce require 208 units of communication
cost, while Meta-MapReduce provides the final results using
36 units of communication cost.
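The totals quoted in this analysis combine as follows; the ratio in the last line is derived here from the stated figures and is not reported in the text.

```latex
\begin{align*}
\text{G-Hadoop / Hierarchical MapReduce:}\quad
  & \underbrace{76}_{\text{within clusters}}
    + \underbrace{132}_{\text{to Cluster 2 and two iterations}}
    = 208 \text{ units},\\
\text{Meta-MapReduce:}\quad & 36 \text{ units},\\
\text{ratio:}\quad & 36 / 208 \approx 0.17 .
\end{align*}
```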

4.2 Large Size of Joining Values 

We have considered that sizes of joining values are very
small as compared to sizes of all the other non-joining
values. For example, in Figure 5, the sizes of all the values
of the attribute B are very small as compared to all the
values of the attributes A and C. However, assuming that the
values of the joining attribute are very small is not
realistic. The values of the joining attribute may also
require a considerable amount of memory, which may be equal
to or greater than the sizes of non-joining values. In this
case, it is not useful to send all the values of the joining
attribute with metadata of non-joining attributes. Thus, we
enhance Meta-MapReduce for handling the case of large
joining values.

We consider our running example of join of two relations
X(A, B) and Y(B, C), where the size of each b_i is large
enough that the value of b_i cannot be used as metadata.
We use a hash function to gain a short identifier (that is
unique with high probability) for each b_i. We denote by
H(b_i) the hash value of the original value of b_i. Here,
Meta-MapReduce works as follows:

STEP 1 For all the values of the joining attribute (B), use
a hash function such that an identical b_i in both of
the relations has a unique hash value with a high
probability, and b_i and b_j, i ≠ j, receive two different
hash values with a high probability.

STEP 2 For all the values of the non-joining attributes
(values corresponding to the attributes A and C), find
metadata that includes the size of each of the values.

STEP 3 Perform the task using Meta-MapReduce, as follows:
(i) Users send hash values of joining attributes
and metadata of the non-joining attributes. For example,
a user sends the hash value of b_i (H(b_i)) and the
corresponding metadata (i.e., size) of values a_i or c_i to
the site of mappers. (ii) A mapper processes an assigned
tuple and provides intermediate outputs, where a key
is H(b_i) and a value is |a_i| or |c_i|. (iii) Reducers call
all the values corresponding to a key (hash value), and
if a reducer receives metadata of a_i and c_i, then the
reducer calls the original input data and provides the
final output.

Note that there may be a possibility that two different
values of the joining attribute have an identical hash
value; hence, these two values are assigned to a reducer.
However, the reducer will learn of these two different
values when it calls the corresponding data. The
reducer notifies the master process, and a new hash
function is used.
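A minimal sketch of the hashing step, including the switch to a new hash function on a collision, is shown below. Several choices are assumptions made for illustration: a salted, truncated SHA-256 digest plays the role of H, a new salt plays the role of "a new hash function", and the collision check is done up front where the values are available rather than at the reducer as in the text.

```python
import hashlib


def short_id(value: str, salt: int, bits: int) -> int:
    """Truncate a salted SHA-256 digest to `bits` bits (illustrative choice of H)."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def assign_ids(values, bits, max_attempts=100):
    """Return (salt, value -> id) such that distinct values get distinct ids."""
    distinct = set(values)
    for salt in range(max_attempts):
        ids = {v: short_id(v, salt, bits) for v in distinct}
        if len(set(ids.values())) == len(distinct):
            return salt, ids  # collision-free: keep this hash function
        # Collision detected: loop again with a new hash function (new salt).
    raise RuntimeError("no collision-free salt found; increase `bits`")


# Hypothetical large joining values.
b_values = ["long-join-value-%d" % i for i in range(50)]
m = len(set(b_values))
bits = 3 * (m - 1).bit_length()   # about 3 * log2(m) bits per identifier
salt, ids = assign_ids(b_values, bits)
```

Identical values always receive the same identifier because the mapping is deterministic for a fixed salt, and the identifiers here take 18 bits regardless of how long the original values are.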

Theorem 3 (The communication cost) Using Meta-MapReduce
for the problem of join where values of joining
attributes are large, the communication cost for the problem of
join of two relations is at most 6n · log m + h(c + w) bits, where
n is the number of tuples in each relation, m is the maximal
number of tuples in two relations, c is the maximum size of a
value of the joining attribute, h is the number of tuples that
actually join, and w is the maximum required memory for a tuple.

Proof. The maximal number of tuples having different
values of a joining attribute in all relations is m, which is
upper bounded by 2n; hence, a hash function that maps the
m values into m^3 values will result in a unique hash value
for each of the m keys with a high probability. Thus, we
use at most 3 · log m bits for metadata of a single value, and
hence, at most 6n · log m bits are required to move metadata
from the user's site to the site of mappers-reducers. Since
at most h tuples join and the maximum size of a tuple is w,
we need to transfer at most hc and at most hw bits from the
map phase to the reduce phase and from the user's site to
the reduce phase, respectively. Hence, the communication
cost is at most 6n · log m + h(c + w) bits. ■
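The "high probability" claim in the proof can be made concrete with a union bound. The derivation below assumes, which the text does not state explicitly, that the hash function behaves like a uniformly random map into a range of m^3 values.

```latex
\Pr[\text{some two of the } m \text{ distinct values collide}]
  \;\le\; \binom{m}{2}\cdot\frac{1}{m^{3}}
  \;=\; \frac{m(m-1)}{2m^{3}}
  \;<\; \frac{1}{2m},
\qquad
\log_2 m^{3} \;=\; 3\log_2 m \ \text{bits per identifier.}
```

Under this assumption the collision probability shrinks as the number of distinct joining values grows, while the identifier length grows only logarithmically.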

4.3 Multi-round Iterations 

We show how Meta-MapReduce can be incorporated in a
multi-round MapReduce job, where values of joining
attributes are as large as the values of non-joining attributes.
In order to explain the working of Meta-MapReduce in
a multi-iterative MapReduce job, we consider an example
of join of four relations U(A, B, C, D), V(A, B, D, E),
W(D, E, F), and X(F, G, H), and perform the join operation
using a cascade of two-way joins.

STEP 1 Find dominating attributes in all the relations. An
attribute that occurs in more than one relation is called
a dominating attribute [15].

For example, in our running example, attributes A, B,
D, E, and F are dominating attributes.

STEP 2 Implement a hash function over all the values of
dominating attributes so that all the identical values of
dominating attributes receive an identical hash value
with a high probability, and all the different values
of dominating attributes receive different hash values
with a high probability.

For example, identical values of a_i, b_i, d_i, e_i, and f_i
receive an identical hash value, and any two values
a_i and a_j, such that i ≠ j, probably receive different
hash values (a similar case exists for different values of
attributes B, D, E, F).

STEP 3 For the values of all the other, non-dominating
attributes (attributes that occur in only one of the
relations), find metadata that includes the size of each
of the values.

STEP 4 Now perform a 2-way cascade join using
Meta-MapReduce and follow a mapping schema according
to the problem for assigning inputs (i.e., outputs of the
map phase) to reducers.

For example, in the equijoin example, we may join relations
as follows: first, join relations U and V, and then join
the relation W to the outputs of the join of relations U
and V. Finally, we join the relation X to the outputs of the
join of relations U, V, and W. Thus, we join the four
relations using three iterations of Meta-MapReduce,
and in the final iteration, reducers call the original
required data.
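Steps 1 to 3 above can be sketched for the four example relations as follows. The function names and the string labels "H(A)" and "|C|" are hypothetical notation used only to show which attributes are hashed and which are replaced by their sizes.

```python
from collections import Counter

# Schemas of the running example.
schemas = {
    "U": ("A", "B", "C", "D"),
    "V": ("A", "B", "D", "E"),
    "W": ("D", "E", "F"),
    "X": ("F", "G", "H"),
}


def dominating_attributes(schemas):
    """Step 1: attributes that occur in more than one relation."""
    counts = Counter(attr for attrs in schemas.values() for attr in set(attrs))
    return sorted(attr for attr, count in counts.items() if count > 1)


def metadata_plan(schemas):
    """Steps 2 and 3: hash dominating attributes, send sizes of the others."""
    dominating = set(dominating_attributes(schemas))
    return {
        rel: ["H(%s)" % a if a in dominating else "|%s|" % a for a in attrs]
        for rel, attrs in schemas.items()
    }


plan = metadata_plan(schemas)
```

For relation U the plan is H(A), H(B), |C|, H(D), and for relation X it is H(F), |G|, |H|, so only C, G, and H are represented by sizes.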

Example: Following our running example, in the first
iteration, a mapper produces (H(a_i), [H(b_i), |c_i|, H(d_i)])
after processing a tuple of the relation U and
(H(a_i), [H(b_i), H(d_i), H(e_i)]) after processing a tuple of
the relation V (where H(a_i) is a key). A reducer
corresponding to H(a_i) provides
(H(a_i), H(b_j), |c_k|, H(d_i), H(e_z)) as outputs.

In the second iteration, a mapper produces
(H(d_i), [H(a_i), H(b_j), |c_k|, H(e_z)]) and
(H(d_i), [H(e_i), H(f_i)]) after processing outputs of the
first iteration and the relation W, respectively. Reducers
in the second iteration provide output tuples by joining
tuples that have an identical H(d_i). In the third iteration,
a mapper produces (H(f_i), [H(a_i), H(b_i), |c_i|, H(d_i), H(e_i)])
or (H(f_i), [|g_i|, |h_i|]), and reducers perform the final
join operations. A reducer for key H(f_i) receives |g_i| and
|h_i| from the relation X together with the output tuples of
the second iteration, and provides the final output by
calling the original input data from the location of the user.

Theorem 4 (The communication cost) Using Meta-MapReduce
for the problem of join where values of joining
attributes are large, the communication cost for the problem of
join of k relations, each of the relations with n tuples, is at most
3knp · log m + h(c + w) bits, where n is the number of tuples in
each relation, m is the maximal number of tuples in k relations, p
is the maximum number of dominating attributes in a relation, c
is the maximum size of a value of the joining attribute, h is the
number of tuples that actually join, and w is the maximum
required memory for a tuple.

Proof. According to Theorem 3, at most 3 · log m bits for
metadata are required for a single value; hence, at most
3knp · log m bits are required to move metadata from the
user's site to the site of mappers-reducers. Since at most h
tuples join and the maximum size of a tuple is w, at most hc
and at most hw bits from the map phase to the reduce phase
and from the user's site to the reduce phase, respectively,
are transferred. Hence, the communication cost is at most
3knp · log m + h(c + w) bits. ■
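The bounds of Theorems 1 to 4 and the MapReduce costs from Table 1 can be compared numerically with a small sketch. The function names are hypothetical, the logarithm is assumed to be base 2, and the parameter values below are arbitrary illustrations rather than measurements.

```python
import math


def meta_join(n, c, h, w):
    """Theorem 1: 2nc + h(c + w) bits."""
    return 2 * n * c + h * (c + w)


def meta_skew_join(n, c, h, w, r):
    """Theorem 2: 2nc + rh(c + w) bits."""
    return 2 * n * c + r * h * (c + w)


def meta_hashed_join(n, m, c, h, w):
    """Theorem 3: 6n log m + h(c + w) bits (log base 2 assumed)."""
    return 6 * n * math.log2(m) + h * (c + w)


def meta_hashed_kway_join(k, n, p, m, c, h, w):
    """Theorem 4: 3knp log m + h(c + w) bits (log base 2 assumed)."""
    return 3 * k * n * p * math.log2(m) + h * (c + w)


def mapreduce_join(n, w):
    """Table 1, MapReduce column for a join of two relations: 4nw bits."""
    return 4 * n * w


def mapreduce_skew_join(n, w, r):
    """Table 1, MapReduce column for skew join: 2nw(1 + r) bits."""
    return 2 * n * w * (1 + r)


def mapreduce_kway_join(k, n, w):
    """Table 1, MapReduce column for a join of k relations: 2knw bits."""
    return 2 * k * n * w


# Illustrative parameters only.
n, c, w, h, r, m = 1024, 8, 1024, 50, 2, 2048
costs = {
    "meta_join": meta_join(n, c, h, w),
    "mapreduce_join": mapreduce_join(n, w),
    "meta_skew_join": meta_skew_join(n, c, h, w, r),
    "mapreduce_skew_join": mapreduce_skew_join(n, w, r),
    "meta_hashed_join": meta_hashed_join(n, m, c, h, w),
}
```

With these numbers the hashed variant is not cheaper than the plain one, because c = 8 is already smaller than 3 · log2 m = 33; comparing 2nc with 6n · log m shows that hashing pays off only when c exceeds 3 · log m.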

5 Versatility of Meta-MapReduce 

Meta-MapReduce decreases the amount of data to be
transferred to the remote site and the amount of intermediate
data, if the final output does not depend on all the inputs.
In particular, problems where the amount of intermediate
data is much larger than the input data to the map phase and
where not all the inputs participate in the final output fit
well in the context of Meta-MapReduce. In this section, we
provide two common problems that can be solved using
Meta-MapReduce.

k-NN Problem using Meta-MapReduce. Problem statement: The
k-nearest-neighbors (k-NN) problem [16], [17] tries to find
the k nearest neighbors of a given object. Two relations, R of
m tuples and S of n tuples, are inputs to the k-NN problem,
where m < n. For example, relations R and S may contain
a list of cities with a full description of each city and
images of places to visit in the city. A solution to the k-NN
problem finds k cities from the relation S for each city
of the relation R such that the distance between the two
cities (the first city belongs to R and the second city belongs
to S) is minimum, and hence, km pairs are produced as
outputs. A basic approach to find k-NN is given in [16]; it
uses two iterations of MapReduce, where the first iteration
provides the local k-NN and the second iteration provides the
global k-NN for each tuple of R.

Communication cost analysis: The communication cost is
the size of all the tuples of R and S that are required to move
to the location of mappers-reducers, plus the size of the
tuples moved from the map phase to the reduce phase in the
two iterations. If k < m, then it is communication-inefficient
to move all the tuples of S from the user's location to the
location of mappers and from the map phase to the reduce
phase. Hence, by sending only metadata of each tuple, a lot
of communication can be avoided.
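The two-iteration approach can be sketched on metadata alone. The sketch assumes, beyond what the text states, that the metadata of a city consists of an identifier and a pair of coordinates, and that Euclidean distance is the measure of interest; the full descriptions and images would be fetched only for the selected cities. The function names are hypothetical.

```python
import heapq
import math


def local_knn(r_point, s_chunk, k):
    """First iteration: the k nearest S entries within one chunk of S."""
    return heapq.nsmallest(
        k, ((math.dist(r_point, point), sid) for sid, point in s_chunk)
    )


def global_knn(r_point, s_chunks, k):
    """Second iteration: merge the local candidates into the global k-NN."""
    candidates = [c for chunk in s_chunks for c in local_knn(r_point, chunk, k)]
    return [sid for _, sid in heapq.nsmallest(k, candidates)]


# Hypothetical metadata of S (identifier and coordinates), split in two chunks.
s_chunks = [
    [("s1", (0.0, 0.0)), ("s2", (5.0, 5.0))],
    [("s3", (1.0, 1.0)), ("s4", (9.0, 9.0))],
]
nearest = global_knn((0.0, 0.0), s_chunks, k=2)

# Only the originals of the selected cities need to be requested.
not_fetched = {"s1", "s2", "s3", "s4"} - set(nearest)
```

In this toy input the two nearest cities are s1 and s3, so the descriptions and images of s2 and s4 are never moved.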



Fig. 6: A graph of a social network.


Shortest Path Finding on a Social Networking Graph using
Meta-MapReduce. Problem statement: Consider a graph
of a social network, where a node represents either a person
or a photo, and an edge exists between two persons if they
are friends or between a person and a photo if the person
is tagged in the photo; however, there is no edge between
two photos; see Figure 6. The implementation of a shortest
path algorithm on the graph results in paths between two
persons i and j with common information (which exists on
the paths) between the two persons i and j.

Communication cost analysis: We want to find a shortest
path between persons P_1 and P_6; refer to Figure 6. A
shortest path algorithm will provide a path
P_1-P_2-P_3-Pic_1-P_5-P_6 or P_1-P_2-P_3-Pic_1-P_4-P_6 and
show the common things between every two persons on the
path and the photo, Pic_1, as a connection between P_3 and
P_5. Note that in this case it is communication-inefficient
to send all the photos and common information between every
two friends of the graph to the site of mappers, because most
of the nodes are removed in the final output. Hence, it is
beneficial to send metadata of people and photos to the
location of mappers and to process metadata to find a
shortest path between two persons at the reduce phase. In
Figure 6, reducers that provide the final output call
information of persons (P_1, P_2, P_3, P_5, P_6) and photo
(Pic_1). Consequently, there is no need to send Pic_2 and
Pic_3.
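The idea of searching on metadata and fetching only the nodes on the path can be sketched with a breadth-first search. The edge list below is hypothetical: it is chosen so that the path P1-P2-P3-Pic1-P5-P6 exists, and it does not claim to reproduce Figure 6.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search over node identifiers (metadata only)."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical undirected edges: friendships and photo tags.
edges = [
    ("P1", "P2"), ("P2", "P3"), ("P3", "Pic1"), ("Pic1", "P5"),
    ("P5", "P6"), ("P2", "Pic2"), ("P4", "Pic2"), ("P4", "Pic3"),
]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

path = shortest_path(adjacency, "P1", "P6")

# Only the nodes on the path need their full data (profiles, photos) fetched.
not_fetched = set(adjacency) - set(path)
```

In this toy graph the photos Pic2 and Pic3 and the person P4 are never requested, because they do not lie on the returned path.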

6 Conclusion 

The impacts of the locations of data and mappers-reducers,
the number of iterations, and the reducer capacity (the
maximum size of inputs that a reducer can hold) on the
amount of data to be transferred to the location of (remote)
computations are investigated. Based on the investigation,
we found that it is not required to send the whole data to the
location of computation if not all the inputs participate in
the final output. Thus, we proposed a new algorithmic
technique for MapReduce algorithms, called Meta-MapReduce.
Meta-MapReduce greatly decreases the amount of data to be
transferred across clouds by transferring metadata for a
data field rather than the field itself (metadata that is
exponentially smaller), and it processes metadata at the map
phase and the reduce phase. We demonstrated the impact
of Meta-MapReduce for solving problems of equijoin,
k-nearest-neighbors finding, and shortest path finding. In the
simplest case of equijoin of two relations, Meta-MapReduce
requires at most 2nc + h(c + w) bits to be transferred to
the location of computation, while the classical MapReduce
requires at most 4nw bits to be transferred to the location of
computation. We also suggested a way to incorporate
Meta-MapReduce for processing geographically distributed data
and for executing a multi-round MapReduce computation.